Research Question: What are the determinants of COPD mortality in North Carolina? Are rural counties really affected at a disproportionate rate?
Chronic Obsructive Pulmonary Disease (COPD) is the 6th leading cause of death in the U.S. The disorder is the cause of more than 140,000 deaths per year. According to the CDC, COPD is more than two times as common in rural areas in large cities. How does this manifest itself in North Carolina?
Determinants of pulmonary health may include things like air quality, heat, and smoking rates. Factors such as heat exacerbate the effects of bad air quality, and other threats to pulmonary health can include health insurance coverage. Air quality and heat effects are often worse in urban areas, so why is COPD not worse in urban areas than rural areas? The CDCs own data will be explored to see if it is really the case the rural areas still suffer more from COPD, even though it may be true that urban areas are associated with heat island effects and poorer air quality.
The majority of Data was found from the CDC. This includes the outcome variable of COPD mortality. Insurance and smoking data is from Count Health rankings & Road maps, a program of the University of Wisconsin Population Health Institute. This data not used for presentation purposes or for data exploration because it was not offered by the CDC. In the end this data was found and used to fill in gaps in the final analysis and methods for the research.
-CDC data on hospitalizations for asthma -CDC data on extreme heat -CDC data on COPD mortality https://www.cdc.gov/copd/data-and-statistics/county-estimates.html
-EPA data on AQI https://www.epa.gov/outdoor-air-quality-data/air-data-daily-air-quality-tracker
-NCHS data on urban rural classification schemes https://www.cdc.gov/nchs/data_access/urban_rural.htm
A correlation plot is used to test the strength of correlation between many of the determinants of interest against each other. Factors tested include year, urban_level, annual COPD deaths per 100,000, annual asthma ED visits, median AQI, and the number of extreme heat days. The highest correlations were between annual asthma ED visits for the county and urban level, median AQI and year, and extreme heat days and urban level. This correlation revealed a problem for our data as it is. Annual asthma ED visits appears to be a less reliable health outcome than annual COPD mortality for the counties. This is likely because asthma ED visits is a count that is not adjusted for population size. This is a problem for urban level because it gives an association with urban areas and asthma when it is really just growth in population size in urban areas versus rural areas that is causing the correlation.
Median AQI was mapped over time, using counties as different colored lines to see the extremes for various counties and years. The legend for counties is not included because there are too many counties, but the chart is still useful to watch the trends overtime.
Another colorful plot displays the change in annual COPD deaths overtime. This graph indicates that seasonality is at play in COPD deaths. We can tell because of the spikes that happen once a year. Being in winter weather exacerbates COPD issues seasonally. This is an intersting find becuase our initial hypothesis is that hot weather exacerbates pulmonary symptoms and makes breathing more difficult when combined with poor air quality.
Asthma deaths were investigated over time but not used for any more analysis after this investigation. They are clearly spread all over the place and not normalized in any way to fit county populaitons. We decided to stick with one health come as the reprsentative for pulmonary heath; COPD.
Annual number of extreme heat days was graphed over time. The decreasing trend in general was surprising.
A step AIC was created along with a set of diagnostics plots to assess the performance of the potential model. The step AIC found the strongest model to be one that includes all of the following variables: Year Median AQI Urban Level Count Of Extreme Heat Days
Year was chosen to remain in the model as it appeared to be a determinant in earlier data exploration. COPD death is not constant over the years, so the effects of the year were included.
The residuals vs. fitted values plot helps to detect non-linearity and find outliers. The random scatter of points without patterns or trends indicates a likely linear relationship.
The normal Q-Q plot shows that points fall on a relatively straight line. There are some deviations that depart from normality, but the trend of the points is linear in nature.
The scale-location plot indicates that the data has homoscedasticity. The variance is constant.
The residuals vs leverage plot shwos that there are some influential points, but none cause serious concern for removal.
AIC_COPD <- lm(data = heat_health, annual_COPD_deaths_per100000 ~ urban_level +
Median.AQI + Count_ExHeat_Days + Year)
step(AIC_COPD)
## Start: AIC=2462.12
## annual_COPD_deaths_per100000 ~ urban_level + Median.AQI + Count_ExHeat_Days +
## Year
##
## Df Sum of Sq RSS AIC
## - urban_level 1 7.9 191111 2460.1
## - Year 1 159.3 191262 2460.4
## <none> 191103 2462.1
## - Median.AQI 1 5512.5 196615 2471.4
## - Count_ExHeat_Days 1 18639.9 209743 2497.1
##
## Step: AIC=2460.14
## annual_COPD_deaths_per100000 ~ Median.AQI + Count_ExHeat_Days +
## Year
##
## Df Sum of Sq RSS AIC
## - Year 1 155.9 191267 2458.5
## <none> 191111 2460.1
## - Median.AQI 1 5731.8 196843 2469.9
## - Count_ExHeat_Days 1 19862.9 210974 2497.4
##
## Step: AIC=2458.46
## annual_COPD_deaths_per100000 ~ Median.AQI + Count_ExHeat_Days
##
## Df Sum of Sq RSS AIC
## <none> 191267 2458.5
## - Median.AQI 1 7145 198412 2471.0
## - Count_ExHeat_Days 1 19753 211019 2495.5
##
## Call:
## lm(formula = annual_COPD_deaths_per100000 ~ Median.AQI + Count_ExHeat_Days,
## data = heat_health)
##
## Coefficients:
## (Intercept) Median.AQI Count_ExHeat_Days
## 56.2588 0.7417 -0.2704
plot(AIC_COPD)
Several stages of analysis take place. The first is an OLS analysis that looks at counties in which all of the data is available. This OLS analysis looks at the years from 2008 and 2020, the same ones investigated in the data exploration. The second stage of the analysis is a spatial analysis. The spatial analysis focuses on determinants in the year 2020. The maps inspired a third stage of analysis. This stage looks at some extra determinants such as health insurance and smoking data.
Urban level is regressed on COPD mortality to check the direction of the relationship. This auxiliary regression shows that increases in urban level move COPD mortality upwards. This means the higher the urban level, the more rural the area. Extreme heat days is regressed on COPD morality to check the direction of the relationship. The regression shows that increasing extreme heat days lowers COPD mortality on average. Annual median air quality index is regressed on COPD mortality to check the direction of the relationship. The regression shows that the COPD deaths decrease slightly when median AQI increases.
The main regression for this stage of analysis includes urban level, median AQI, count of extreme heat days, and year, all together.
The results show that all of the variables except for year are significant at less than a .01 P value. Urban level, as one of our main determinants of interest, can be ineterpreted as follows: For every 1 unit increase in urban level, annual COPD mortality increases by 1.993 on average, holding all else constant.
aux1_urban <- lm(data = heat_health, annual_COPD_deaths_per100000 ~ urban_level)
aux2_heat <- lm(data = heat_health, annual_COPD_deaths_per100000 ~ Count_ExHeat_Days)
aux3_aqi <- lm(data = heat_health, annual_COPD_deaths_per100000 ~ Median.AQI)
summary(aux1_urban)
##
## Call:
## lm(formula = annual_COPD_deaths_per100000 ~ urban_level, data = heat_health)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.120 -17.994 -1.894 15.553 89.353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.1672 3.4677 21.388 <2e-16 ***
## urban_level 0.8133 0.7912 1.028 0.305
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.18 on 395 degrees of freedom
## Multiple R-squared: 0.002667, Adjusted R-squared: 0.0001424
## F-statistic: 1.056 on 1 and 395 DF, p-value: 0.3047
summary(aux2_heat)
##
## Call:
## lm(formula = annual_COPD_deaths_per100000 ~ Count_ExHeat_Days,
## data = heat_health)
##
## Residuals:
## Min 1Q Median 3Q Max
## -45.773 -17.633 -1.909 15.530 84.291
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 84.10928 1.66510 50.513 < 2e-16 ***
## Count_ExHeat_Days -0.21978 0.04098 -5.363 1.39e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.41 on 395 degrees of freedom
## Multiple R-squared: 0.06788, Adjusted R-squared: 0.06552
## F-statistic: 28.76 on 1 and 395 DF, p-value: 1.394e-07
summary(aux3_aqi)
##
## Call:
## lm(formula = annual_COPD_deaths_per100000 ~ Median.AQI, data = heat_health)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.969 -17.458 -1.996 14.589 90.731
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 63.3582 7.7188 8.208 3.21e-15 ***
## Median.AQI 0.3578 0.1927 1.856 0.0641 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.11 on 395 degrees of freedom
## Multiple R-squared: 0.008649, Adjusted R-squared: 0.006139
## F-statistic: 3.446 on 1 and 395 DF, p-value: 0.06414
mv_reg1<- lm(data = heat_health, annual_COPD_deaths_per100000 ~ urban_level +
Median.AQI + Count_ExHeat_Days + Year)
summary(mv_reg1)
##
## Call:
## lm(formula = annual_COPD_deaths_per100000 ~ urban_level + Median.AQI +
## Count_ExHeat_Days + Year, data = heat_health)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.414 -17.000 -3.246 16.118 84.133
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 432.46153 657.30086 0.658 0.510966
## urban_level -0.10230 0.80516 -0.127 0.898958
## Median.AQI 0.69918 0.20792 3.363 0.000848 ***
## Count_ExHeat_Days -0.27523 0.04451 -6.183 1.58e-09 ***
## Year -0.18571 0.32484 -0.572 0.567857
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.08 on 392 degrees of freedom
## Multiple R-squared: 0.1022, Adjusted R-squared: 0.09305
## F-statistic: 11.16 on 4 and 392 DF, p-value: 1.395e-08
A spatial analysis takes place. These maps help explore some of the trends we saw in our yearly data that was used for the previous OLS. We now take the values for the years 2021 and map them on North Carolina. The shapefile used is from the spatial shape files in the ENVIORN872 course.
## Reading layer `cb_2018_us_county_20m' from data source
## `/Users/mac/Documents/HannahQuinn872/Raw/Spatial/cb_2018_us_county_20m.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 3220 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS: NAD83
Mapview is used to visualize all of the counties in North Carolina.
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
1 Represents a large central metro (red) 2 Represents a large fringe metro (orange) 3 Represents a medium metro (yellow) 4 Represents a small metro (green) 5 Represents a micropolitan area (blue) 6 Represents a noncore area (violet)
The most urban and least urban parts of NC are visualized. The county types are spread out, but there is a concentration of the most rural areas in the western part of NC.
Air quality is mapped for 2021. Rural areas in the west seem to be frequently associated with lower air quality. This was a surprise to us, as we thought urban areas would constantly have the higher AQI.
Air quality is mapped for 2011. We decided to map values for 2011 in an attempt to separate the AQI outcomes from the COVID pandemic when they may have dropped unusually low. 2011 had much higher AQIs in general. Western, rural areas still suffered from high AQI.
## New names:
## New names:
## • `` -> `...8`
The final map created is COPD mortality in NC. This map uses a sequential color change to show ares that suffered from higher annual COPD mortality. These areas match up with many of the more rural, western areas that have been flagged by previous determinants of COPD. Rural southern counties like Richmond and Scotland county had relatively high COPD mortality as well. The leader in COPD moratlity was Northwestern NC county Alleghany county.
#Stage 3 of the analysis is more OLS analysis including smoking and health insurance for 2021
In a final bit of analysis, we attempted to see if smoking and health insurance contributed COPD mortality in 2021. We did not use median AQI or exterme heat days for this analysis since it was only one year of analysis and there was not enough data from these variables (not enough counties recorded the info in 2021).
The results of this analysis show that the direction of the relationships between percent of population that smokes, perent of population that is uninsured, and COPD mortality makes sense. Higher rates of smoking and non-insurance are leading to higher rates of COPD mortality in these areas. The AIC shows that when only accounting for this limited set of variables, level code becomes unimportant for the regression.
stage3 <- read_csv(here("Cleaned/Stage3.csv"), show_col_types = FALSE)
extra_COPD <- lm(data = stage3, copdval ~ lvlcode + PctSmoke + PctUninsured.x)
step(extra_COPD)
## Start: AIC=578.23
## copdval ~ lvlcode + PctSmoke + PctUninsured.x
##
## Df Sum of Sq RSS AIC
## - lvlcode 1 107.37 42899 576.46
## <none> 42792 578.23
## - PctUninsured.x 1 2150.69 44942 580.79
## - PctSmoke 1 2320.73 45112 581.14
##
## Step: AIC=576.46
## copdval ~ PctSmoke + PctUninsured.x
##
## Df Sum of Sq RSS AIC
## <none> 42899 576.46
## - PctUninsured.x 1 2599.1 45498 579.93
## - PctSmoke 1 3056.2 45955 580.86
##
## Call:
## lm(formula = copdval ~ PctSmoke + PctUninsured.x, data = stage3)
##
## Coefficients:
## (Intercept) PctSmoke PctUninsured.x
## -22.475 3.149 2.149
summary(extra_COPD)
##
## Call:
## lm(formula = copdval ~ lvlcode + PctSmoke + PctUninsured.x, data = stage3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.338 -17.028 -1.225 11.813 61.735
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.3367 21.5053 -0.946 0.3469
## lvlcode 0.8814 1.8652 0.473 0.6377
## PctSmoke 2.9291 1.3332 2.197 0.0306 *
## PctUninsured.x 2.0282 0.9590 2.115 0.0372 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.93 on 89 degrees of freedom
## Multiple R-squared: 0.1851, Adjusted R-squared: 0.1576
## F-statistic: 6.737 on 3 and 89 DF, p-value: 0.000379
Our data exploration and multi-stage analysis leads us to believe that urban areas do not experience COPD at as high of rates as rural areas in NC. Our analysis showed that higher heat that may be associated with urban heat island effects did not play a large role in increasing COPD deaths. Smoking and health insurance rates did play a large role, as well as urban or rural levels. This matches the findings of the CDC.
Areas of NC that are more rural have other factors of interest such as popoulations that work in mining, manufacturing, and farm work. These are areas of employment that expose employees to high levels of PM. Employment is a factor of interest that could show how different professions lead to different COPD outcomes.
In rural NC, people can be located far away form healthcare. For this reason, it is crucial for policy and advocacy to promote teleheatlh, virtual appointments, and phone visits for elderly rural populations that may not get regular health care because they are uninsured or located far away.